SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: Darzins, Aldis 

Mrachko, Gregory T. 

(ii) TITLE OF INVENTION: A Sphingomonas Biodesulfurization Catalyst

(iii) NUMBER OF SEQUENCES: 13 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Hamilton, Brook, Smith & Reynolds, P.C. 

(B) STREET: Two Militia Drive 

(C) CITY: Lexington 

(D) STATE: Massachusetts 

(E) COUNTRY: USA 

(F) ZIP: 02173

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: US 08/851,089 

(B) FILING DATE: 05-MAY-1997 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 08/835,292 

(B) FILING DATE: 07-APR-1997

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Elmore, Carolyn S. 

(B) REGISTRATION NUMBER: 37,567 

(C) REFERENCE/DOCKET NUMBER: EBC97-06A 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (781) 861-6240 

(B) TELEFAX: (781) 861-9540 



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1362 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..1359



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

ATG ACC GAT CCA CGT CAG CTG CAC CTG GCC GGA TTC TTC TGT GCC GGC 48
Met Thr Asp Pro Arg Gln Leu His Leu Ala Gly Phe Phe Cys Ala Gly
1 5 10 15

AAC GTC ACG CAC GCC CAC GGA GCG TGG CGC CAC GCC GAC GAC TCC AAC 96
Asn Val Thr His Ala His Gly Ala Trp Arg His Ala Asp Asp Ser Asn
20 25 30

GGC TTC CTC ACC AAG GAG TAC TAC CAG CAG ATT GCC CGC ACG CTC GAG 144
Gly Phe Leu Thr Lys Glu Tyr Tyr Gln Gln Ile Ala Arg Thr Leu Glu
35 40 45

CGC GGC AAG TTC GAC CTG CTG TTC CTT CCC GAC GCG CTC GCC GTG TGG 192
Arg Gly Lys Phe Asp Leu Leu Phe Leu Pro Asp Ala Leu Ala Val Trp
50 55 60

GAC AGC TAC GGC GAC AAT CTG GAG ACC GGT CTG CGG TAT GGC GGG CAA 240
Asp Ser Tyr Gly Asp Asn Leu Glu Thr Gly Leu Arg Tyr Gly Gly Gln
65 70 75 80

GGC GCG GTG ATG CTG GAG CCC GGC GTA GTT ATC GCC GCG ATG GCC TCG 288
Gly Ala Val Met Leu Glu Pro Gly Val Val Ile Ala Ala Met Ala Ser
85 90 95

GTG ACC GAA CAT CTG GGG CTG GGC GCC ACC ATT TCC ACC ACC TAC TAC 336
Val Thr Glu His Leu Gly Leu Gly Ala Thr Ile Ser Thr Thr Tyr Tyr
100 105 110

CCG CCC TAC CAT GTA GCC CGG GTC GTC GCT TCG CTG GAC CAG CTG TCC 384
Pro Pro Tyr His Val Ala Arg Val Val Ala Ser Leu Asp Gln Leu Ser
115 120 125

TCC GGG CGA GTG TCG TGG AAC GTG GTC ACC TCG CTC AGC AAT GCA GAG 432
Ser Gly Arg Val Ser Trp Asn Val Val Thr Ser Leu Ser Asn Ala Glu
130 135 140

GCG CGC AAC TTC GGC TTC GAT GAA CAT CTC GAC CAC GAT GCC CGC TAC 480
Ala Arg Asn Phe Gly Phe Asp Glu His Leu Asp His Asp Ala Arg Tyr
145 150 155 160

GAT CGC GCC GAT GAA TTC CTC GAG GTC GTG CGC AAG CTC TGG AAC AGC 528
Asp Arg Ala Asp Glu Phe Leu Glu Val Val Arg Lys Leu Trp Asn Ser
165 170 175




TGG GAT CGC GAT GCG CTG ACA CTC GAC AAG GCA ACC GGC CAG TTC GCC 576
Trp Asp Arg Asp Ala Leu Thr Leu Asp Lys Ala Thr Gly Gln Phe Ala
180 185 190

GAT CCG GCT AAG GTG CGC TAC ATC GAC CAC CGC GGC GAA TGG CTC AAC 624
Asp Pro Ala Lys Val Arg Tyr Ile Asp His Arg Gly Glu Trp Leu Asn
195 200 205

GTA CGC GGG CCG CTT CAG GTG CCG CGC TCC CCC CAG GGC GAG CCT GTC 672
Val Arg Gly Pro Leu Gln Val Pro Arg Ser Pro Gln Gly Glu Pro Val
210 215 220

ATT CTG CAG GCC GGG CTT TCG GCG CGG GGC AAG CGC TTC GCC GGG CGC 720
Ile Leu Gln Ala Gly Leu Ser Ala Arg Gly Lys Arg Phe Ala Gly Arg
225 230 235 240

TGG GCG GAC GCG GTG TTC ACG ATT TCG CCC AAT CTG GAC ATC ATG CAG 768
Trp Ala Asp Ala Val Phe Thr Ile Ser Pro Asn Leu Asp Ile Met Gln
245 250 255

GCC ACG TAC CGC GAC ATA AAG GCG CAG GTC GAG GCC GCC GGA CGC GAT 816
Ala Thr Tyr Arg Asp Ile Lys Ala Gln Val Glu Ala Ala Gly Arg Asp
260 265 270

CCC GAG CAG GTC AAG GTG TTT GCC GCG GTG ATG CCG ATC CTC GGC GAG 864
Pro Glu Gln Val Lys Val Phe Ala Ala Val Met Pro Ile Leu Gly Glu
275 280 285

ACC GAG GCG ATC GCC AGG CAG CGT CTC GAA TAC ATA AAT TCG CTG GTG 912
Thr Glu Ala Ile Ala Arg Gln Arg Leu Glu Tyr Ile Asn Ser Leu Val
290 295 300

CAT CCC GAA GTC GGG CTT TCT ACG TTG TCC AGC CAT GTC GGG GTC AAC 960
His Pro Glu Val Gly Leu Ser Thr Leu Ser Ser His Val Gly Val Asn
305 310 315 320

CTT GCC GAC TAT TCG CTC GAT ACC CCG CTG ACC GAG GTC CTG GGC GAT 1008
Leu Ala Asp Tyr Ser Leu Asp Thr Pro Leu Thr Glu Val Leu Gly Asp
325 330 335

CTC GCC CAG CGC AAC GTG CCC ACC CAA CTG GGC ATG TTC GCC AGG ATG 1056
Leu Ala Gln Arg Asn Val Pro Thr Gln Leu Gly Met Phe Ala Arg Met
340 345 350

TTG CAG GCC GAG ACG CTG ACC GTG GGA GAA ATG GGC CGG CGT TAT GGC 1104
Leu Gln Ala Glu Thr Leu Thr Val Gly Glu Met Gly Arg Arg Tyr Gly
355 360 365

GCC AAC GTG GGC TTC GTC CCG CAG TGG GCG GGA ACC CGC GAG CAG ATC 1152
Ala Asn Val Gly Phe Val Pro Gln Trp Ala Gly Thr Arg Glu Gln Ile
370 375 380

GCG GAC CTG ATC GAG ATC CAT TTC AAG GCC GGC GGC GCC GAT GGC TTC 1200
Ala Asp Leu Ile Glu Ile His Phe Lys Ala Gly Gly Ala Asp Gly Phe
385 390 395 400

ATC ATC TCG CCG GCG TTC CTG CCC GGA TCT TAC GAG GAA TTC GTC GAT 1248
Ile Ile Ser Pro Ala Phe Leu Pro Gly Ser Tyr Glu Glu Phe Val Asp
405 410 415

CAG GTG GTG CCC ATC CTG CAG CAC CGC GGA CTG TTC CGC ACT GAT TAC 1296
Gln Val Val Pro Ile Leu Gln His Arg Gly Leu Phe Arg Thr Asp Tyr
420 425 430

GAA GGC CGC ACC CTG CGC AGC CAT CTG GGA CTG CGT GAA CCC GCA TAC 1344
Glu Gly Arg Thr Leu Arg Ser His Leu Gly Leu Arg Glu Pro Ala Tyr
435 440 445

CTG GGA GAG TAC GCA TGA 1362
Leu Gly Glu Tyr Ala
450



(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 453 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Thr Asp Pro Arg Gln Leu His Leu Ala Gly Phe Phe Cys Ala Gly
1 5 10 15

Asn Val Thr His Ala His Gly Ala Trp Arg His Ala Asp Asp Ser Asn
20 25 30

Gly Phe Leu Thr Lys Glu Tyr Tyr Gln Gln Ile Ala Arg Thr Leu Glu
35 40 45

Arg Gly Lys Phe Asp Leu Leu Phe Leu Pro Asp Ala Leu Ala Val Trp
50 55 60

Asp Ser Tyr Gly Asp Asn Leu Glu Thr Gly Leu Arg Tyr Gly Gly Gln
65 70 75 80

Gly Ala Val Met Leu Glu Pro Gly Val Val Ile Ala Ala Met Ala Ser
85 90 95

Val Thr Glu His Leu Gly Leu Gly Ala Thr Ile Ser Thr Thr Tyr Tyr
100 105 110

Pro Pro Tyr His Val Ala Arg Val Val Ala Ser Leu Asp Gln Leu Ser
115 120 125

Ser Gly Arg Val Ser Trp Asn Val Val Thr Ser Leu Ser Asn Ala Glu
130 135 140

Ala Arg Asn Phe Gly Phe Asp Glu His Leu Asp His Asp Ala Arg Tyr
145 150 155 160

Asp Arg Ala Asp Glu Phe Leu Glu Val Val Arg Lys Leu Trp Asn Ser
165 170 175

Trp Asp Arg Asp Ala Leu Thr Leu Asp Lys Ala Thr Gly Gln Phe Ala
180 185 190

Asp Pro Ala Lys Val Arg Tyr Ile Asp His Arg Gly Glu Trp Leu Asn
195 200 205

Val Arg Gly Pro Leu Gln Val Pro Arg Ser Pro Gln Gly Glu Pro Val
210 215 220

Ile Leu Gln Ala Gly Leu Ser Ala Arg Gly Lys Arg Phe Ala Gly Arg
225 230 235 240

Trp Ala Asp Ala Val Phe Thr Ile Ser Pro Asn Leu Asp Ile Met Gln
245 250 255

Ala Thr Tyr Arg Asp Ile Lys Ala Gln Val Glu Ala Ala Gly Arg Asp
260 265 270

Pro Glu Gln Val Lys Val Phe Ala Ala Val Met Pro Ile Leu Gly Glu
275 280 285

Thr Glu Ala Ile Ala Arg Gln Arg Leu Glu Tyr Ile Asn Ser Leu Val
290 295 300

His Pro Glu Val Gly Leu Ser Thr Leu Ser Ser His Val Gly Val Asn
305 310 315 320

Leu Ala Asp Tyr Ser Leu Asp Thr Pro Leu Thr Glu Val Leu Gly Asp
325 330 335

Leu Ala Gln Arg Asn Val Pro Thr Gln Leu Gly Met Phe Ala Arg Met
340 345 350

Leu Gln Ala Glu Thr Leu Thr Val Gly Glu Met Gly Arg Arg Tyr Gly
355 360 365

Ala Asn Val Gly Phe Val Pro Gln Trp Ala Gly Thr Arg Glu Gln Ile
370 375 380

Ala Asp Leu Ile Glu Ile His Phe Lys Ala Gly Gly Ala Asp Gly Phe
385 390 395 400

Ile Ile Ser Pro Ala Phe Leu Pro Gly Ser Tyr Glu Glu Phe Val Asp
405 410 415

Gln Val Val Pro Ile Leu Gln His Arg Gly Leu Phe Arg Thr Asp Tyr
420 425 430

Glu Gly Arg Thr Leu Arg Ser His Leu Gly Leu Arg Glu Pro Ala Tyr
435 440 445

Leu Gly Glu Tyr Ala
450
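
The CDS locations and stated lengths in this listing can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the filed listing: the helper name `cds_codon_count` and the `LISTED` table are assumptions introduced here, and the values in the table are copied from the LOCATION and LENGTH fields given for SEQ ID NOS: 1 to 6.

```python
def cds_codon_count(location: str) -> int:
    """Return the number of whole codons spanned by a 'start..end' CDS location."""
    start, end = (int(part) for part in location.split(".."))
    span = end - start + 1
    if span % 3:
        raise ValueError(f"CDS span {span} is not a multiple of 3")
    return span // 3


# (CDS location, stated nucleotide length, stated protein length), as listed
LISTED = {
    "SEQ ID NO: 1 / 2": ("1..1359", 1362, 453),
    "SEQ ID NO: 3 / 4": ("1..1107", 1110, 369),
    "SEQ ID NO: 5 / 6": ("1..1236", 1236, 412),
}

for label, (location, nt_length, aa_length) in LISTED.items():
    codons = cds_codon_count(location)
    # Nucleotides listed after the end of the CDS feature
    trailing = nt_length - int(location.split("..")[1])
    print(f"{label}: {codons} codons, matches protein length: "
          f"{codons == aa_length}, trailing nucleotides: {trailing}")
```

Under these assumptions each CDS span yields the same number of codons as the stated length of the paired protein sequence. The three trailing nucleotides in SEQ ID NOS: 1 and 3 correspond to the TGA triplet shown at the end of each of those sequences, whereas SEQ ID NO: 5 as listed ends at its last sense codon.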

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1110 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..1107 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

ATG ACG ACA GAC ATC CAC CCG GCG AGC GCC GCA TCG TCG CCG GCG GCG 48
Met Thr Thr Asp Ile His Pro Ala Ser Ala Ala Ser Ser Pro Ala Ala
1 5 10 15

CGC GCG ACG ATC ACC TAC AGC AAC TGC CCC GTG CCT AAT GCC CTG CTC 96
Arg Ala Thr Ile Thr Tyr Ser Asn Cys Pro Val Pro Asn Ala Leu Leu
20 25 30

GCC GCG CTC GGC TCA GGT ATT CTG GAC AGT GCC GGG ATC ACA CTT GCC 144
Ala Ala Leu Gly Ser Gly Ile Leu Asp Ser Ala Gly Ile Thr Leu Ala
35 40 45

CTG CTG ACC GGA AAG CAG GGC GAG GTG CAC TTC ACC TAC GAC CGA GAT 192
Leu Leu Thr Gly Lys Gln Gly Glu Val His Phe Thr Tyr Asp Arg Asp
50 55 60

GAC TAC ACC CGC TTC GGC GGC GAG ATT CCG CCG CTG GTC AGC GAG GGA 240
Asp Tyr Thr Arg Phe Gly Gly Glu Ile Pro Pro Leu Val Ser Glu Gly
65 70 75 80

CTG CGT GCG CCG GGG CGG ACC CGC CTG CTG GGA CTG ACG CCG GTG CTG 288
Leu Arg Ala Pro Gly Arg Thr Arg Leu Leu Gly Leu Thr Pro Val Leu
85 90 95

GGC CGC TGG GGC TAC TTC GTC CGG GGC GAC AGC GCG ATC CGC ACC CCG 336
Gly Arg Trp Gly Tyr Phe Val Arg Gly Asp Ser Ala Ile Arg Thr Pro
100 105 110

GCC GAT CTT GCC GGC CGC CGC GTC GGA GTA TCC GAT TCG GCC AGG AGG 384
Ala Asp Leu Ala Gly Arg Arg Val Gly Val Ser Asp Ser Ala Arg Arg
115 120 125

ATA TTG ACC GGA AGG CTG GGC GAC TAC CGC GAA CTT GAT CCC TGG CGG 432
Ile Leu Thr Gly Arg Leu Gly Asp Tyr Arg Glu Leu Asp Pro Trp Arg
130 135 140

CAG ACC CTG GTC GCG CTG GGG ACA TGG GAG GCG CGT GCC TTG CTG AGC 480
Gln Thr Leu Val Ala Leu Gly Thr Trp Glu Ala Arg Ala Leu Leu Ser
145 150 155 160

ACG CTC GAG ACG GCG GGG CTT GGC GTC GGC GAC GTC GAG CTG ACG CGC 528
Thr Leu Glu Thr Ala Gly Leu Gly Val Gly Asp Val Glu Leu Thr Arg
165 170 175

ATC GAG AAC CCG TTC GTC GAC GTG CCG ACC GAA CGA CTG CAT GCC GCC 576
Ile Glu Asn Pro Phe Val Asp Val Pro Thr Glu Arg Leu His Ala Ala
180 185 190

GGC TCG CTC AAA GGA ACC GAC CTG TTC CCC GAC GTG ACC AGC CAG CAG 624
Gly Ser Leu Lys Gly Thr Asp Leu Phe Pro Asp Val Thr Ser Gln Gln
195 200 205

GCC GCA GTC CTT GAG GAT GAG CGC GCC GAC GCC CTG TTC GCG TGG CTT 672
Ala Ala Val Leu Glu Asp Glu Arg Ala Asp Ala Leu Phe Ala Trp Leu
210 215 220

CCC TGG GCG GCC GAG CTC GAG ACC CGC ATC GGT GCA CGG CCG GTC CTA 720
Pro Trp Ala Ala Glu Leu Glu Thr Arg Ile Gly Ala Arg Pro Val Leu
225 230 235 240

GAC CTC AGC GCA GAC GAC CGC AAT GCC TAT GCG AGC ACC TGG ACG GTG 768
Asp Leu Ser Ala Asp Asp Arg Asn Ala Tyr Ala Ser Thr Trp Thr Val
245 250 255

AGC GCC GAG CTG GTG GAC CGG CAG CCC GAA CTG GTG CAG CGG CTC GTC 816
Ser Ala Glu Leu Val Asp Arg Gln Pro Glu Leu Val Gln Arg Leu Val
260 265 270

GAT GCC GTG GTG GAT GCA GGG CGG TGG GCC GAG GCC AAT GGC GAT GTC 864
Asp Ala Val Val Asp Ala Gly Arg Trp Ala Glu Ala Asn Gly Asp Val
275 280 285

GTC TCC CGC CTG CAC GCC GAT AAC CTC GGT GTC AGT CCC GAA AGC GTC 912
Val Ser Arg Leu His Ala Asp Asn Leu Gly Val Ser Pro Glu Ser Val
290 295 300

CGC CAG GGA TTC GGA GCC GAT TTT CAC CGC CGC CTG ACG CCG CGG CTC 960
Arg Gln Gly Phe Gly Ala Asp Phe His Arg Arg Leu Thr Pro Arg Leu
305 310 315 320

GAC AGC GAT GCT ATC GCC ATC CTG GAG CGT ACT CAG CGG TTC CTG AAG 1008
Asp Ser Asp Ala Ile Ala Ile Leu Glu Arg Thr Gln Arg Phe Leu Lys
325 330 335

GAT GCG AAC CTG ATC GAT CGG TCG TTG GCG CTC GAT CGG TGG GCT GCA 1056
Asp Ala Asn Leu Ile Asp Arg Ser Leu Ala Leu Asp Arg Trp Ala Ala
340 345 350

CCT GAA TTC CTC GAA CAA AGT CTC TCA CGC CAG GTC GAA GGG CAG ATA 1104
Pro Glu Phe Leu Glu Gln Ser Leu Ser Arg Gln Val Glu Gly Gln Ile
355 360 365

GCA TGA 1110
Ala



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 369 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Thr Thr Asp Ile His Pro Ala Ser Ala Ala Ser Ser Pro Ala Ala
1 5 10 15

Arg Ala Thr Ile Thr Tyr Ser Asn Cys Pro Val Pro Asn Ala Leu Leu
20 25 30

Ala Ala Leu Gly Ser Gly Ile Leu Asp Ser Ala Gly Ile Thr Leu Ala
35 40 45

Leu Leu Thr Gly Lys Gln Gly Glu Val His Phe Thr Tyr Asp Arg Asp
50 55 60

Asp Tyr Thr Arg Phe Gly Gly Glu Ile Pro Pro Leu Val Ser Glu Gly
65 70 75 80

Leu Arg Ala Pro Gly Arg Thr Arg Leu Leu Gly Leu Thr Pro Val Leu
85 90 95

Gly Arg Trp Gly Tyr Phe Val Arg Gly Asp Ser Ala Ile Arg Thr Pro
100 105 110

Ala Asp Leu Ala Gly Arg Arg Val Gly Val Ser Asp Ser Ala Arg Arg
115 120 125

Ile Leu Thr Gly Arg Leu Gly Asp Tyr Arg Glu Leu Asp Pro Trp Arg
130 135 140

Gln Thr Leu Val Ala Leu Gly Thr Trp Glu Ala Arg Ala Leu Leu Ser
145 150 155 160

Thr Leu Glu Thr Ala Gly Leu Gly Val Gly Asp Val Glu Leu Thr Arg
165 170 175

Ile Glu Asn Pro Phe Val Asp Val Pro Thr Glu Arg Leu His Ala Ala
180 185 190

Gly Ser Leu Lys Gly Thr Asp Leu Phe Pro Asp Val Thr Ser Gln Gln
195 200 205

Ala Ala Val Leu Glu Asp Glu Arg Ala Asp Ala Leu Phe Ala Trp Leu
210 215 220

Pro Trp Ala Ala Glu Leu Glu Thr Arg Ile Gly Ala Arg Pro Val Leu
225 230 235 240

Asp Leu Ser Ala Asp Asp Arg Asn Ala Tyr Ala Ser Thr Trp Thr Val
245 250 255

Ser Ala Glu Leu Val Asp Arg Gln Pro Glu Leu Val Gln Arg Leu Val
260 265 270

Asp Ala Val Val Asp Ala Gly Arg Trp Ala Glu Ala Asn Gly Asp Val
275 280 285

Val Ser Arg Leu His Ala Asp Asn Leu Gly Val Ser Pro Glu Ser Val
290 295 300

Arg Gln Gly Phe Gly Ala Asp Phe His Arg Arg Leu Thr Pro Arg Leu
305 310 315 320

Asp Ser Asp Ala Ile Ala Ile Leu Glu Arg Thr Gln Arg Phe Leu Lys
325 330 335

Asp Ala Asn Leu Ile Asp Arg Ser Leu Ala Leu Asp Arg Trp Ala Ala
340 345 350

Pro Glu Phe Leu Glu Gln Ser Leu Ser Arg Gln Val Glu Gly Gln Ile
355 360 365

Ala



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1236 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..1236



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

ATG AAC GAA CTC GTC AAA GAT CTC GGC CTC AAT CGA TCC GAT CCG ATC 48
Met Asn Glu Leu Val Lys Asp Leu Gly Leu Asn Arg Ser Asp Pro Ile
1 5 10 15

GGC GCT GTG CGG CGA CTG GCC GCG CAG TGG GGG GCC ACC GCT GTT GAT 96
Gly Ala Val Arg Arg Leu Ala Ala Gln Trp Gly Ala Thr Ala Val Asp
20 25 30

CGG GAC CGG GCC GGC GGA TCG GCA ACC GCC GAA CTC GAT CAA CTG CGC 144
Arg Asp Arg Ala Gly Gly Ser Ala Thr Ala Glu Leu Asp Gln Leu Arg
35 40 45

GGC AGC GGC CTG CTC TCG CTG TCC ATT CCC GCC GCA TAT GGC GGC TGG 192
Gly Ser Gly Leu Leu Ser Leu Ser Ile Pro Ala Ala Tyr Gly Gly Trp
50 55 60

GGC GCC GAC TGG CCA ACG ACT CTG GAA GTT ATC CGC GAA GTC GCA ACG 240
Gly Ala Asp Trp Pro Thr Thr Leu Glu Val Ile Arg Glu Val Ala Thr
65 70 75 80

GTG GAC GGA TCG CTG GCG CAT CTA TTC GGC TAC CAC CTC GGC TGC GTA 288
Val Asp Gly Ser Leu Ala His Leu Phe Gly Tyr His Leu Gly Cys Val
85 90 95

CCG ATG ATC GAG CTG TTC GGC TCG GCG CCA CAA AAG GAA CGG CTG TAC 336
Pro Met Ile Glu Leu Phe Gly Ser Ala Pro Gln Lys Glu Arg Leu Tyr
100 105 110

CGC CAG ATC GCA AGC CAT GAT TGG CGG GTC GGG AAT GCG TCG AGC GAA 384
Arg Gln Ile Ala Ser His Asp Trp Arg Val Gly Asn Ala Ser Ser Glu
115 120 125

AAC AAC AGC CAC GTG CTC GAG TGG AAG CTT GCC GCC ACC GCC GTC GAT 432
Asn Asn Ser His Val Leu Glu Trp Lys Leu Ala Ala Thr Ala Val Asp
130 135 140

GAT GGC GGG TTC GTC CTC AAC GGC GCG AAG CAC TTC TGC AGC GGC GCC 480
Asp Gly Gly Phe Val Leu Asn Gly Ala Lys His Phe Cys Ser Gly Ala
145 150 155 160

AAA AGC TCC GAC CTG CTC ATC GTG TTC GGC GTG ATC CAG GAC GAA TCC 528
Lys Ser Ser Asp Leu Leu Ile Val Phe Gly Val Ile Gln Asp Glu Ser
165 170 175

CCC CTG CGC GGC GCG ATC ATC ACC GCG GTC ATT CCC ACC GAC CGG GCC 576
Pro Leu Arg Gly Ala Ile Ile Thr Ala Val Ile Pro Thr Asp Arg Ala
180 185 190

GGT GTT CAG ATC AAT GAC GAC TGG CGC GCA ATC GGG ATG CGC CAG ACC 624
Gly Val Gln Ile Asn Asp Asp Trp Arg Ala Ile Gly Met Arg Gln Thr
195 200 205

GAC AGC GGC AGC GCC GAA TTT CGC GAC GTC CGA GTC TAC CCA GAC GAG 672
Asp Ser Gly Ser Ala Glu Phe Arg Asp Val Arg Val Tyr Pro Asp Glu
210 215 220

ATC TTG GGG GCA CCA AAC TCA GTC GTT GAG GCG TTC GTG ACA AGC AAC 720
Ile Leu Gly Ala Pro Asn Ser Val Val Glu Ala Phe Val Thr Ser Asn
225 230 235 240

CGC GGC AGC CTG TGG ACG CCG GCG ATT CAG TCG ATC TTC TCG AAC GTT 768
Arg Gly Ser Leu Trp Thr Pro Ala Ile Gln Ser Ile Phe Ser Asn Val
245 250 255

TAT CTG GGG CTC GCG CGT GGC GCG CTC GAG GCG GCA GCG GAT TAC ACC 816
Tyr Leu Gly Leu Ala Arg Gly Ala Leu Glu Ala Ala Ala Asp Tyr Thr
260 265 270

CGG ACC CAG AGC CGC CCC TGG ACA CCC GCC GGC GTG GCG AAG GCG ACA 864
Arg Thr Gln Ser Arg Pro Trp Thr Pro Ala Gly Val Ala Lys Ala Thr
275 280 285

GAG GAT CCC CAC ATC ATC GCC ACC TAC GGT GAA CTG GCG ATC GCG CTC 912
Glu Asp Pro His Ile Ile Ala Thr Tyr Gly Glu Leu Ala Ile Ala Leu
290 295 300

CAG GGC GCC GAG GCG GCC GCG CGC GAG GTC GCG GCC CTG TTG CAA CAG 960
Gln Gly Ala Glu Ala Ala Ala Arg Glu Val Ala Ala Leu Leu Gln Gln
305 310 315 320

GCG TGG GAC AAG GGC GAT GCG GTG ACG CCC GAA GAG CGC GGC CAG CTG 1008
Ala Trp Asp Lys Gly Asp Ala Val Thr Pro Glu Glu Arg Gly Gln Leu
325 330 335

ATG GTG AAG GTT TCG GGT GTG AAG GCC CTC TCG ACG AAG GCC GCC CTC 1056
Met Val Lys Val Ser Gly Val Lys Ala Leu Ser Thr Lys Ala Ala Leu
340 345 350

GAC ATC ACC AGC CGT ATT TTC GAG ACA ACG GGC TCG CGA TCG ACG CAT 1104
Asp Ile Thr Ser Arg Ile Phe Glu Thr Thr Gly Ser Arg Ser Thr His
355 360 365

CCC AGA TAC GGA TTC GAT CGG TTC TGG CGT AAC ATC CGG ACT CAT ACG 1152
Pro Arg Tyr Gly Phe Asp Arg Phe Trp Arg Asn Ile Arg Thr His Thr
370 375 380

CTG CAC GAT CCG GTA TCG TAT AAA ATC GTC GAT GTG GGG AAC TAC ACG 1200
Leu His Asp Pro Val Ser Tyr Lys Ile Val Asp Val Gly Asn Tyr Thr
385 390 395 400

CTC AAC GGG ACA TTC CCG GTT CCC GGA TTT ACG TCA 1236
Leu Asn Gly Thr Phe Pro Val Pro Gly Phe Thr Ser
405 410



(2) INFORMATION FOR SEQ ID NO: 6:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 412 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:



Met Asn Glu Leu Val Lys Asp Leu Gly Leu Asn Arg Ser Asp Pro Ile
1 5 10 15

Gly Ala Val Arg Arg Leu Ala Ala Gln Trp Gly Ala Thr Ala Val Asp
20 25 30

Arg Asp Arg Ala Gly Gly Ser Ala Thr Ala Glu Leu Asp Gln Leu Arg
35 40 45

Gly Ser Gly Leu Leu Ser Leu Ser Ile Pro Ala Ala Tyr Gly Gly Trp
50 55 60

Gly Ala Asp Trp Pro Thr Thr Leu Glu Val Ile Arg Glu Val Ala Thr
65 70 75 80

Val Asp Gly Ser Leu Ala His Leu Phe Gly Tyr His Leu Gly Cys Val
85 90 95

Pro Met Ile Glu Leu Phe Gly Ser Ala Pro Gln Lys Glu Arg Leu Tyr
100 105 110

Arg Gln Ile Ala Ser His Asp Trp Arg Val Gly Asn Ala Ser Ser Glu
115 120 125

Asn Asn Ser His Val Leu Glu Trp Lys Leu Ala Ala Thr Ala Val Asp
130 135 140

Asp Gly Gly Phe Val Leu Asn Gly Ala Lys His Phe Cys Ser Gly Ala
145 150 155 160

Lys Ser Ser Asp Leu Leu Ile Val Phe Gly Val Ile Gln Asp Glu Ser
165 170 175

Pro Leu Arg Gly Ala Ile Ile Thr Ala Val Ile Pro Thr Asp Arg Ala
180 185 190

Gly Val Gln Ile Asn Asp Asp Trp Arg Ala Ile Gly Met Arg Gln Thr
195 200 205

Asp Ser Gly Ser Ala Glu Phe Arg Asp Val Arg Val Tyr Pro Asp Glu
210 215 220

Ile Leu Gly Ala Pro Asn Ser Val Val Glu Ala Phe Val Thr Ser Asn
225 230 235 240

Arg Gly Ser Leu Trp Thr Pro Ala Ile Gln Ser Ile Phe Ser Asn Val
245 250 255

Tyr Leu Gly Leu Ala Arg Gly Ala Leu Glu Ala Ala Ala Asp Tyr Thr
260 265 270

Arg Thr Gln Ser Arg Pro Trp Thr Pro Ala Gly Val Ala Lys Ala Thr
275 280 285

Glu Asp Pro His Ile Ile Ala Thr Tyr Gly Glu Leu Ala Ile Ala Leu
290 295 300

Gln Gly Ala Glu Ala Ala Ala Arg Glu Val Ala Ala Leu Leu Gln Gln
305 310 315 320

Ala Trp Asp Lys Gly Asp Ala Val Thr Pro Glu Glu Arg Gly Gln Leu
325 330 335

Met Val Lys Val Ser Gly Val Lys Ala Leu Ser Thr Lys Ala Ala Leu
340 345 350

Asp Ile Thr Ser Arg Ile Phe Glu Thr Thr Gly Ser Arg Ser Thr His
355 360 365

Pro Arg Tyr Gly Phe Asp Arg Phe Trp Arg Asn Ile Arg Thr His Thr
370 375 380

Leu His Asp Pro Val Ser Tyr Lys Ile Val Asp Val Gly Asn Tyr Thr
385 390 395 400

Leu Asn Gly Thr Phe Pro Val Pro Gly Phe Thr Ser
405 410



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Thr Thr Asp Ile His Pro Ala Ser Ala Ala Ser Ser Pro Ala Ala Arg
1 5 10 15

Ala Thr Ile Thr Tyr Ser
20 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
ACNGAYATHC AYCCNGC 17 
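
SEQ ID NO: 8 is written with IUPAC nucleotide ambiguity codes (N, Y, H), so it denotes a pool of oligonucleotides rather than a single sequence. The sketch below is illustrative only and is not part of the filed listing: it counts the distinct sequences in that pool and translates the in-frame degenerate codons using the standard genetic code. The names `IUPAC`, `CODON_TABLE`, `degeneracy` and `encoded_residues` are assumptions introduced here. Read in frame from its first base, the oligonucleotide appears consistent with the Thr Asp Ile His Pro Ala stretch of SEQ ID NO: 7, although the listing itself does not state that relationship.

```python
from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons ordered T, C, A, G at each position
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def degeneracy(oligo: str) -> int:
    """Number of distinct plain sequences a degenerate oligo represents."""
    return prod(len(IUPAC[base]) for base in oligo)


def encoded_residues(codon: str) -> set:
    """One-letter residues encoded by every expansion of a degenerate codon."""
    expansions = product(*(IUPAC[base] for base in codon))
    return {CODON_TABLE["".join(triplet)] for triplet in expansions}


PROBE = "ACNGAYATHCAYCCNGC"  # SEQ ID NO: 8 with the spacing removed

# Whole codons only; the final two bases (GC) do not form a complete codon
whole = len(PROBE) - len(PROBE) % 3
codons = [PROBE[i:i + 3] for i in range(0, whole, 3)]

print(degeneracy(PROBE))
print([sorted(encoded_residues(codon)) for codon in codons])
```

Each of the five complete codons resolves to a single residue regardless of which base fills the ambiguous position, which is what makes a degenerate oligonucleotide of this kind usable against a known peptide sequence. The trailing GC is the invariant start of the four alanine codons (GCN).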

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 453 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:



Met Thr Gln Gln Arg Gln Met His Leu Ala Gly Phe Phe Ser Ala Gly
1 5 10 15

Asn Val Thr His Ala His Gly Ala Trp Arg His Thr Asp Ala Ser Asn
20 25 30

Asp Phe Leu Ser Gly Lys Tyr Tyr Gln His Ile Ala Arg Thr Leu Glu
35 40 45

Arg Gly Lys Phe Asp Leu Leu Phe Leu Pro Asp Gly Leu Ala Val Glu
50 55 60

Asp Ser Tyr Gly Asp Asn Leu Asp Thr Gly Val Gly Leu Gly Gly Gln
65 70 75 80

Gly Ala Val Ala Leu Glu Pro Ala Ser Val Val Ala Thr Met Ala Ala
85 90 95

Val Thr Glu His Leu Gly Leu Gly Ala Thr Ile Ser Ala Thr Tyr Tyr
100 105 110

Pro Pro Tyr His Val Ala Arg Val Phe Ala Thr Leu Asp Gln Leu Ser
115 120 125

Gly Gly Arg Val Ser Trp Asn Val Val Thr Ser Leu Asn Asp Ala Glu
130 135 140

Ala Arg Asn Phe Gly Ile Asn Gln His Leu Glu His Asp Ala Arg Tyr
145 150 155 160

Asp Arg Ala Asp Glu Phe Leu Glu Ala Val Lys Lys Leu Trp Asn Ser
165 170 175

Trp Asp Glu Asp Ala Leu Val Leu Asp Lys Ala Ala Gly Val Phe Ala
180 185 190

Asp Pro Ala Lys Val His Tyr Val Asp His His Gly Glu Trp Leu Asn
195 200 205

Val Arg Gly Pro Leu Gln Val Pro Arg Ser Pro Gln Gly Glu Pro Val
210 215 220

Ile Leu Gln Ala Gly Leu Ser Pro Arg Gly Arg Arg Phe Ala Gly Lys
225 230 235 240

Trp Ala Glu Ala Val Phe Ser Leu Ala Pro Asn Leu Glu Val Met Gln
245 250 255

Ala Thr Tyr Gln Gly Ile Lys Ala Glu Val Asp Ala Ala Gly Arg Asp
260 265 270

Pro Asp Gln Thr Lys Ile Phe Thr Ala Val Met Pro Val Leu Gly Glu
275 280 285

Ser Gln Ala Val Ala Gln Glu Arg Leu Glu Tyr Leu Asn Ser Leu Val
290 295 300

His Pro Glu Val Gly Leu Ser Thr Leu Ser Ser His Thr Gly Ile Asn
305 310 315 320

Leu Ala Ala Tyr Pro Leu Asp Thr Pro Ile Lys Asp Ile Leu Arg Asp
325 330 335

Leu Gln Asp Arg Asn Val Pro Thr Gln Leu His Met Phe Ala Ala Ala
340 345 350

Thr His Ser Glu Glu Leu Thr Leu Ala Glu Met Gly Arg Arg Tyr Gly
355 360 365

Thr Asn Val Gly Phe Val Pro Gln Trp Ala Gly Thr Gly Glu Gln Ile
370 375 380

Ala Asp Glu Leu Ile Arg His Phe Glu Gly Gly Ala Ala Asp Gly Phe
385 390 395 400

Ile Ile Ser Pro Ala Phe Leu Pro Gly Ser Tyr Asp Glu Phe Val Asp
405 410 415

Gln Val Val Pro Val Leu Gln Asp Arg Gly Tyr Phe Arg Thr Glu Tyr
420 425 430

Gln Gly Asn Thr Leu Arg Asp His Leu Gly Leu Arg Val Pro Gln Leu
435 440 445

Gln Gly Gln Pro Ser
450



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 365 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Met Thr Ser Arg Val Asp Pro Ala Asn Pro Gly Ser Glu Leu Asp Ser
1 5 10 15

Ala Ile Arg Asp Thr Leu Thr Tyr Ser Asn Cys Pro Val Pro Asn Ala
20 25 30

Leu Leu Thr Ala Ser Glu Ser Gly Phe Leu Asp Ala Ala Gly Ile Glu
35 40 45

Leu Asp Val Leu Ser Gly Gln Gln Gly Thr Val His Phe Thr Tyr Asp
50 55 60

Gln Pro Ala Tyr Thr Arg Phe Gly Gly Glu Ile Pro Pro Leu Leu Ser
65 70 75 80

Glu Gly Leu Arg Ala Pro Gly Arg Thr Arg Leu Leu Gly Ile Thr Pro
85 90 95

Leu Leu Gly Arg Gln Gly Phe Phe Val Arg Asp Asp Ser Pro Ile Thr
100 105 110

Ala Ala Ala Asp Leu Ala Gly Arg Arg Ile Gly Val Ser Ala Ser Ala
115 120 125

Ile Arg Ile Leu Arg Gly Gln Leu Gly Asp Tyr Leu Glu Leu Asp Pro
130 135 140

Trp Arg Gln Thr Leu Val Ala Leu Gly Ser Trp Glu Ala Arg Ala Leu
145 150 155 160

Leu His Thr Leu Glu His Gly Glu Leu Gly Val Asp Asp Val Glu Leu
165 170 175

Val Pro Ile Ser Ser Pro Gly Val Asp Val Pro Ala Glu Gln Leu Glu
180 185 190

Glu Ser Ala Thr Val Lys Gly Ala Asp Leu Phe Pro Asp Val Ala Arg
195 200 205

Gly Gln Ala Ala Val Leu Ala Ser Gly Asp Val Asp Ala Leu Tyr Ser
210 215 220

Trp Leu Pro Trp Ala Gly Glu Leu Gln Ala Thr Gly Ala Arg Pro Val
225 230 235 240

Val Asp Leu Gly Leu Asp Glu Arg Asn Ala Tyr Ala Ser Val Trp Thr
245 250 255

Val Ser Ser Gly Leu Val Arg Gln Arg Pro Gly Leu Val Gln Arg Leu
260 265 270

Val Asp Ala Ala Val Asp Ala Gly Leu Trp Ala Arg Asp His Ser Asp
275 280 285

Ala Val Thr Ser Leu His Ala Ala Asn Leu Gly Val Ser Thr Gly Ala
290 295 300

Val Gly Gln Gly Phe Gly Ala Asp Phe Gln Gln Arg Leu Val Pro Arg
305 310 315 320

Leu Asp His Asp Ala Leu Ala Leu Leu Glu Arg Thr Gln Gln Phe Leu
325 330 335

Leu Thr Asn Asn Leu Leu Gln Glu Pro Val Ala Leu Asp Gln Trp Ala
340 345 350

Ala Pro Glu Phe Leu Asn Asn Ser Leu Asn Arg His Arg
355 360 365



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 417 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Met Thr Leu Ser Pro Glu Lys Gln His Val Arg Pro Arg Asp Ala Ala
1 5 10 15

Asp Asn Asp Pro Val Ala Val Ala Arg Gly Leu Ala Glu Lys Trp Arg
20 25 30

Ala Thr Ala Val Glu Arg Asp Arg Ala Gly Gly Ser Ala Thr Ala Glu
35 40 45

Arg Glu Asp Leu Arg Ala Ser Ala Leu Leu Ser Leu Leu Val Pro Arg
50 55 60

Glu Tyr Gly Gly Trp Gly Ala Asp Trp Pro Thr Ala Ile Glu Val Val
65 70 75 80

Arg Glu Ile Ala Ala Ala Asp Gly Ser Leu Gly His Leu Phe Gly Tyr
85 90 95

His Leu Thr Asn Ala Pro Met Ile Glu Leu Ile Gly Ser Gln Glu Gln
100 105 110

Glu Glu His Leu Tyr Thr Gln Ile Ala Gln Asn Asn Trp Trp Thr Gly
115 120 125

Asn Ala Ser Ser Glu Asn Asn Ser His Val Leu Asp Trp Lys Val Ser
130 135 140

Ala Thr Pro Thr Glu Asp Gly Gly Tyr Val Leu Asn Gly Thr Lys His
145 150 155 160

Phe Cys Ser Gly Ala Lys Gly Ser Asp Leu Leu Phe Val Phe Gly Val
165 170 175

Val Gln Asp Asp Ser Pro Gln Gln Gly Ala Ile Ile Ala Ala Ala Ile
180 185 190

Pro Thr Ser Arg Ala Gly Val Thr Pro Asn Asp Asp Trp Ala Ala Ile
195 200 205

Gly Met Arg Gln Thr Asp Ser Gly Ser Thr Asp Phe His Asn Val Lys
210 215 220

Val Glu Pro Asp Glu Val Leu Gly Ala Pro Asn Ala Phe Val Leu Ala
225 230 235 240

Phe Ile Gln Ser Glu Arg Gly Ser Leu Phe Ala Pro Ile Ala Gln Leu
245 250 255

Ile Phe Ala Asn Val Tyr Leu Gly Ile Ala His Gly Ala Leu Asp Ala
260 265 270

Ala Arg Glu Tyr Thr Arg Thr Gln Ala Arg Pro Trp Thr Pro Ala Gly
275 280 285

Ile Gln Gln Ala Thr Glu Asp Pro Tyr Thr Ile Arg Ser Tyr Gly Glu
290 295 300

Phe Thr Ile Ala Leu Gln Gly Ala Asp Ala Ala Ala Arg Glu Ala Ala
305 310 315 320

His Leu Leu Gln Thr Val Trp Asp Lys Gly Asp Ala Leu Thr Pro Glu
325 330 335

Asp Arg Gly Glu Leu Met Val Lys Val Ser Gly Val Lys Ala Leu Ala
340 345 350

Thr Asn Ala Ala Leu Asn Ile Ser Ser Gly Val Phe Glu Val Ile Gly
355 360 365

Ala Arg Gly Thr His Pro Arg Tyr Gly Phe Asp Arg Phe Trp Arg Asn
370 375 380

Val Arg Thr His Ser Leu His Asp Pro Val Ser Tyr Lys Ile Ala Asp
385 390 395 400

Val Gly Lys His Thr Leu Asn Gly Gln Tyr Pro Ile Pro Gly Phe Thr
405 410 415

Ser



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4144 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

GGTTCGAGAT CGATCTGACC GTCGAACCCG GCGCGGTTCA AACCATCCTC TGGGGCCTCT 60

TCTTGCACTT GACATAGGAA TCTCTACTAA ATAAATAGAT ATTTATTCGA CACTAAGTTC 120

GGTGATCAGG CCGACCGTGT GTCTCAAGTG CTCGCTCCGG GTTGCCACGA GCTAAAGCGC 180

GCGATGCTGG GGCGACAGCG CTAGGCATTG CGTTCCCTCA CACCAATGAT GAGATGATAC 240

GATGCGCATG ACCACTATCC GCACCTAGCA CGAAAGATCC GTGCATTTCG CGAATGCCAA 300

TGAAGAGGAC CGACGTACGG CAGCTTCCTA CGCTTTCGCG CCATCGTTCA TAGCCAAGGT 360

CTTTTCGACG CCGGTTCGCG TGGGCGACTG ACGGCGGTAG CGCCGCGACT ATTCGTTTCA 420

AACTCACGAG GATAAGAGCC TATGACCGAT CCACGTCAGC TGCACCTGGC CGGATTCTTC 480

TGTGCCGGCA ACGTCACGCA CGCCCACGGA GCGTGGCGCC ACGCCGACGA CTCCAACGGC 540

TTCCTCACCA AGGAGTACTA CCAGCAGATT GCCCGCACGC TCGAGCGCGG CAAGTTCGAC 600

CTGCTGTTCC TTCCCGACGC GCTCGCCGTG TGGGACAGCT ACGGCGACAA TCTGGAGACC 660

GGTCTGCGGT ATGGCGGGCA AGGCGCGGTG ATGCTGGAGC CCGGCGTAGT TATCGCCGCG 720

ATGGCCTCGG TGACCGAACA TCTGGGGCTG GGCGCCACCA TTTCCACCAC CTACTACCCG 780

CCCTACCATG TAGCCCGGGT CGTCGCTTCG CTGGACCAGC TGTCCTCCGG GCGAGTGTCG 840

TGGAACGTGG TCACCTCGCT CAGCAATGCA GAGGCGCGCA ACTTCGGCTT CGATGAACAT 900

CTCGACCACG ATGCCCGCTA CGATCGCGCC GATGAATTCC TCGAGGTCGT GCGCAAGCTC 960

TGGAACAGCT GGGATCGCGA TGCGCTGACA CTCGACAAGG CAACCGGCCA GTTCGCCGAT 1020

CCGGCTAAGG TGCGCTACAT CGACCACCGC GGCGAATGGC TCAACGTACG CGGGCCGCTT 1080

CAGGTGCCGC GCTCCCCCCA GGGCGAGCCT GTCATTCTGC AGGCCGGGCT TTCGGCGCGG 1140

GGCAAGCGCT TCGCCGGGCG CTGGGCGGAC GCGGTGTTCA CGATTTCGCC CAATCTGGAC 1200

ATCATGCAGG CCACGTACCG CGACATAAAG GCGCAGGTCG AGGCCGCCGG ACGCGATCCC 1260

GAGCAGGTCA AGGTGTTTGC CGCGGTGATG CCGATCCTCG GCGAGACCGA GGCGATCGCC 1320

AGGCAGCGTC TCGAATACAT AAATTCGCTG GTGCATCCCG AAGTCGGGCT TTCTACGTTG 1380

TCCAGCCATG TCGGGGTCAA CCTTGCCGAC TATTCGCTCG ATACCCCGCT GACCGAGGTC 1440

CTGGGCGATC TCGCCCAGCG CAACGTGCCC ACCCAACTGG GCATGTTCGC CAGGATGTTG 1500




CAGGCCGAGA CGCTGACCGT GGGAGAAATG GGCCGGCGTT ATGGCGCCAA CGTGGGCTTC 1560

GTCCCGCAGT GGGCGGGAAC CCGCGAGCAG ATCGCGGACC TGATCGAGAT CCATTTCAAG 1620

GCCGGCGGCG CCGATGGCTT CATCATCTCG CCGGCGTTCC TGCCCGGATC TTACGAGGAA 1680

TTCGTCGATC AGGTGGTGCC CATCCTGCAG CACCGCGGAC TGTTCCGCAC TGATTACGAA 1740

GGCCGCACCC TGCGCAGCCA TCTGGGACTG CGTGAACCCG CATACCTGGG AGAGTACGCA 1800

TGACGACAGA CATCCACCCG GCGAGCGCCG CATCGTCGCC GGCGGCGCGC GCGACGATCA 1860

CCTACAGCAA CTGCCCCGTG CCTAATGCCC TGCTCGCCGC GCTCGGCTCA GGTATTCTGG 1920

ACAGTGCCGG GATCACACTT GCCCTGCTGA CCGGAAAGCA GGGCGAGGTG CACTTCACCT 1980

ACGACCGAGA TGACTACACC CGCTTCGGCG GCGAGATTCC GCCGCTGGTC AGCGAGGGAC 2040

TGCGTGCGCC GGGGCGGACC CGCCTGCTGG GACTGACGCC GGTGCTGGGC CGCTGGGGCT 2100

ACTTCGTCCG GGGCGACAGC GCGATCCGCA CCCCGGCCGA TCTTGCCGGC CGCCGCGTCG 2160

GAGTATCCGA TTCGGCCAGG AGGATATTGA CCGGAAGGCT GGGCGACTAC CGCGAACTTG 2220

ATCCCTGGCG GCAGACCCTG GTCGCGCTGG GGACATGGGA GGCGCGTGCC TTGCTGAGCA 2280

CGCTCGAGAC GGCGGGGCTT GGCGTCGGCG ACGTCGAGCT GACGCGCATC GAGAACCCGT 2340

TCGTCGACGT GCCGACCGAA CGACTGCATG CCGCCGGCTC GCTCAAAGGA ACCGACCTGT 2400

TCCCCGACGT GACCAGCCAG CAGGCCGCAG TCCTTGAGGA TGAGCGCGCC GACGCCCTGT 2460

TCGCGTGGCT TCCCTGGGCG GCCGAGCTCG AGACCCGCAT CGGTGCACGG CCGGTCCTAG 2520

ACCTCAGCGC AGACGACCGC AATGCCTATG CGAGCACCTG GACGGTGAGC GCCGAGCTGG 2580

TGGACCGGCA GCCCGAACTG GTGCAGCGGC TCGTCGATGC CGTGGTGGAT GCAGGGCGGT 2640

GGGCCGAGGC CAATGGCGAT GTCGTCTCCC GCCTGCACGC CGATAACCTC GGTGTCAGTC 2700

CCGAAAGCGT CCGCCAGGGA TTCGGAGCCG ATTTTCACCG CCGCCTGACG CCGCGGCTCG 2760

ACAGCGATGC TATCGCCATC CTGGAGCGTA CTCAGCGGTT CCTGAAGGAT GCGAACCTGA 2820

TCGATCGGTC GTTGGCGCTC GATCGGTGGG CTGCACCTGA ATTCCTCGAA CAAAGTCTCT 2880

CACGCCAGGT CGAAGGGCAG ATAGCATGAA CGAACTCGTC AAAGATCTCG GCCTCAATCG 2940

ATCCGATCCG ATCGGCGCTG TGCGGCGACT GGCCGCGCAG TGGGGGGCCA CCGCTGTTGA 3000

TCGGGACCGG GCCGGCGGAT CGGCAACCGC CGAACTCGAT CAACTGCGCG GCAGCGGCCT 3060

GCTCTCGCTG TCCATTCCCG CCGCATATGG CGGCTGGGGC GCCGACTGGC CAACGACTCT 3120

GGAAGTTATC CGCGAAGTCG CAACGGTGGA CGGATCGCTG GCGCATCTAT TCGGCTACCA 3180

CCTCGGCTGC GTACCGATGA TCGAGCTGTT CGGCTCGGCG CCACAAAAGG AACGGCTGTA 3240

CCGCCAGATC GCAAGCCATG ATTGGCGGGT CGGGAATGCG TCGAGCGAAA ACAACAGCCA 3300

CGTGCTCGAG TGGAAGCTTG CCGCCACCGC CGTCGATGAT GGCGGGTTCG TCCTCAACGG 3360

CGCGAAGCAC TTCTGCAGCG GCGCCAAAAG CTCCGACCTG CTCATCGTGT TCGGCGTGAT 3420

CCAGGACGAA TCCCCCCTGC GCGGCGCGAT CATCACCGCG GTCATTCCCA CCGACCGGGC 3480

CGGTGTTCAG ATCAATGACG ACTGGCGCGC AATCGGGATG CGCCAGACCG ACAGCGGCAG 3540

CGCCGAATTT CGCGACGTCC GAGTCTACCC AGACGAGATC TTGGGGGCAC CAAACTCAGT 3600

CGTTGAGGCG TTCGTGACAA GCAACCGCGG CAGCCTGTGG ACGCCGGCGA TTCAGTCGAT 3660

CTTCTCGAAC GTTTATCTGG GGCTCGCGCG TGGCGCGCTC GAGGCGGCAG CGGATTACAC 3720

CCGGACCCAG AGCCGCCCCT GGACACCCGC CGGCGTGGCG AAGGCGACAG AGGATCCCCA 3780

CATCATCGCC ACCTACGGTG AACTGGCGAT CGCGCTCCAG GGCGCCGAGG CGGCCGCGCG 3840

CGAGGTCGCG GCCCTGTTGC AACAGGCGTG GGACAAGGGC GATGCGGTGA CGCCCGAAGA 3900

GCGCGGCCAG CTGATGGTGA AGGTTTCGGG TGTGAAGGCC CTCTCGACGA AGGCCGCCCT 3960

CGACATCACC AGCCGTATTT TCGAGACAAC GGGCTCGCGA TCGACGCATC CCAGATACGG 4020

ATTCGATCGG TTCTGGCGTA ACATCCGGAC TCATACGCTG CACGATCCGG TATCGTATAA 4080

AATCGTCGAT GTGGGGAACT ACACGCTCAA CGGGACATTC CCGGTTCCCG GATTTACGTC 4140

ATGA 4144

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4144 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: 
TCATGACGTA AATCCGGGAA CCGGGAATGT CCCGTTGAGC GTGTAGTTCC CCACATCGAC 60

GATTTTATAC GATACCGGAT CGTGCAGCGT ATGAGTCCGG ATGTTACGCC AGAACCGATC 120

GAATCCGTAT CTGGGATGCG TCGATCGCGA GCCCGTTGTC TCGAAAATAC GGCTGGTGAT 180

GTCGAGGGCG GCCTTCGTCG AGAGGGCCTT CACACCCGAA ACCTTCACCA TCAGCTGGCC 240

GCGCTCTTCG GGCGTCACCG CATCGCCCTT GTCCCACGCC TGTTGCAACA GGGCCGCGAC 300

CTCGCGCGCG GCCGCCTCGG CGCCCTGGAG CGCGATCGCC AGTTCACCGT AGGTGGCGAT 360

GATGTGGGGA TCCTCTGTCG CCTTCGCCAC GCCGGCGGGT GTCCAGGGGC GGCTCTGGGT 420

CCGGGTGTAA TCCGCTGCCG CCTCGAGCGC GCCACGCGCG AGCCCCAGAT AAACGTTCGA 480

GAAGATCGAC TGAATCGCCG GCGTCCACAG GCTGCCGCGG TTGCTTGTCA CGAACGCCTC 540

AACGACTGAG TTTGGTGCCC CCAAGATCTC GTCTGGGTAG ACTCGGACGT CGCGAAATTC 600

GGCGCTGCCG CTGTCGGTCT GGCGCATCCC GATTGCGCGC CAGTCGTCAT TGATCTGAAC 660

ACCGGCCCGG TCGGTGGGAA TGACCGCGGT GATGATCGCG CCGCGCAGGG GGGATTCGTC 720

CTGGATCACG CCGAACACGA TGAGCAGGTC GGAGCTTTTG GCGCCGCTGC AGAAGTGCTT 780

CGCGCCGTTG AGGACGAACC CGCCATCATC GACGGCGGTG GCGGCAAGCT TCCACTCGAG 840

CACGTGGCTG TTGTTTTCGC TCGACGCATT CCCGACCCGC CAATCATGGC TTGCGATCTG 900

GCGGTACAGC CGTTCCTTTT GTGGCGCCGA GCCGAACAGC TCGATCATCG GTACGCAGCC 960

GAGGTGGTAG CCGAATAGAT GCGCCAGCGA TCCGTCCACC GTTGCGACTT CGCGGATAAC 1020

TTCCAGAGTC GTTGGCCAGT CGGCGCCCCA GCCGCCATAT GCGGCGGGAA TGGACAGCGA 1080

GAGCAGGCCG CTGCCGCGCA GTTGATCGAG TTCGGCGGTT GCCGATCCGC CGGCCCGGTC 1140

CCGATCAACA GCGGTGGCCC CCCACTGCGC GGCCAGTCGC CGCACAGCGC CGATCGGATC 1200

GGATCGATTG AGGCCGAGAT CTTTGACGAG TTCGTTCATG CTATCTGCCC TTCGACCTGG 1260

CGTGAGAGAC TTTGTTCGAG GAATTCAGGT GCAGCCCACC GATCGAGCGC CAACGACCGA 1320

TCGATCAGGT TCGCATCCTT CAGGAACCGC TGAGTACGCT CCAGGATGGC GATAGCATCG 1380

CTGTCGAGCC GCGGCGTCAG GCGGCGGTGA AAATCGGCTC CGAATCCCTG GCGGACGCTT 1440

TCGGGACTGA CACCGAGGTT ATCGGCGTGC AGGCGGGAGA CGACATCGCC ATTGGCCTCG 1500

GCCCACCGCC CTGCATCCAC CACGGCATCG ACGAGCCGCT GCACCAGTTC GGGCTGCCGG 1560

TCCACCAGCT CGGCGCTCAC CGTCCAGGTG CTCGCATAGG CATTGCGGTC GTCTGCGCTG 1620

AGGTCTAGGA CCGGCCGTGC ACCGATGCGG GTCTCGAGCT CGGCCGCCCA GGGAAGCCAC 1680

GCGAACAGGG CGTCGGCGCG CTCATCCTCA AGGACTGCGG CCTGCTGGCT GGTCACGTCG 1740

GGGAACAGGT CGGTTCCTTT GAGCGAGCCG GCGGCATGCA GTCGTTCGGT CGGCACGTCG 1800

ACGAACGGGT TCTCGATGCG CGTCAGCTCG ACGTCGCCGA CGCCAAGCCC CGCCGTCTCG 1860

AGCGTGCTCA GCAAGGCACG CGCCTCCCAT GTCCCCAGCG CGACCAGGGT CTGCCGCCAG 1920

GGATCAAGTT CGCGGTAGTC GCCCAGCCTT CCGGTCAATA TCCTCCTGGC CGAATCGGAT 1980

ACTCCGACGC GGCGGCCGGC AAGATCGGCC GGGGTGCGGA TCGCGCTGTC GCCCCGGACG 2040

AAGTAGCCCC AGCGGCCCAG CACCGGCGTC AGTCCCAGCA GGCGGGTCCG CCCCGGCGCA 2100

CGCAGTCCCT CGCTGACCAG CGGCGGAATC TCGCCGCCGA AGCGGGTGTA GTCATCTCGG 2160

TCGTAGGTGA AGTGCACCTC GCCCTGCTTT CCGGTCAGCA GGGCAAGTGT GATCCCGGCA 2220

CTGTCCAGAA TACCTGAGCC GAGCGCGGCG AGCAGGGCAT TAGGCACGGG GCAGTTGCTG 2280

TAGGTGATCG TCGCGCGCGC CGCCGGCGAC GATGCGGCGC TCGCCGGGTG GATGTCTGTC 2340

GTCATGCGTA CTCTCCCAGG TATGCGGGTT CACGCAGTCC CAGATGGCTG CGCAGGGTGC 2400

GGCCTTCGTA ATCAGTGCGG AACAGTCCGC GGTGCTGCAG GATGGGCACC ACCTGATCGA 2460

CGAATTCCTC GTAAGATCCG GGCAGGAACG CCGGCGAGAT GATGAAGCCA TCGGCGCCGC 2520

CGGCCTTGAA ATGGATCTCG ATCAGGTCCG CGATCTGCTC GCGGGTTCCC GCCCACTGCG 2580

GGACGAAGCC CACGTTGGCG CCATAACGCC GGCCCATTTC TCCCACGGTC AGCGTCTCGG 2640

CCTGCAACAT CCTGGCGAAC ATGCCCAGTT GGGTGGGCAC GTTGCGCTGG GCGAGATCGC 2700

CCAGGACCTC GGTCAGCGGG GTATCGAGCG AATAGTCGGC AAGGTTGACC CCGACATGGC 2760

TGGACAACGT AGAAAGCCCG ACTTCGGGAT GCACCAGCGA ATTTATGTAT TCGAGACGCT 2820

GCCTGGCGAT CGCCTCGGTC TCGCCGAGGA TCGGCATCAC CGCGGCAAAC ACCTTGACCT 2880

GCTCGGGATC GCGTCCGGCG GCCTCGACCT GCGCCTTTAT GTCGCGGTAC GTGGCCTGCA 2940

TGATGTCCAG ATTGGGCGAA ATCGTGAACA CCGCGTCCGC CCAGCGCCCG GCGAAGCGCT 3000

TGCCCCGCGC CGAAAGCCCG GCCTGCAGAA TGACAGGCTC GCCCTGGGGG GAGCGCGGCA 3060

CCTGAAGCGG CCCGCGTACG TTGAGCCATT CGCCGCGGTG GTCGATGTAG CGCACCTTAG 3120

CCGGATCGGC GAACTGGCCG GTTGCCTTGT CGAGTGTCAG CGCATCGCGA TCCCAGCTGT 3180

TCCAGAGCTT GCGCACGACC TCGAGGAATT CATCGGCGCG ATCGTAGCGG GCATCGTGGT 3240

CGAGATGTTC ATCGAAGCCG AAGTTGCGCG CCTCTGCATT GCTGAGCGAG GTGACCACGT 3300

TCCACGACAC TCGCCCGGAG GACAGCTGGT CCAGCGAAGC GACGACCCGG GCTACATGGT 3360




AGGGCGGGTA GTAGGTGGTG GAAATGGTGG CGCCCAGCCC CAGATGTTCG GTCACCGAGG 3420

CCATCGCGGC GATAACTACG CCGGGCTCCA GCATCACCGC GCCTTGCCCG CCATACCGCA 3480

GACCGGTCTC CAGATTGTCG CCGTAGCTGT CCCACACGGC GAGCGCGTCG GGAAGGAACA 3540

GCAGGTCGAA CTTGCCGCGC TCGAGCGTGC GGGCAATCTG CTGGTAGTAC TCCTTGGTGA 3600

GGAAGCCGTT GGAGTCGTCG GCGTGGCGCC ACGCTCCGTG GGCGTGCGTG ACGTTGCCGG 3660

CACAGAAGAA TCCGGCCAGG TGCAGCTGAC GTGGATCGGT CATAGGCTCT TATCCTCGTG 3720

AGTTTGAAAC GAATAGTCGC GGCGCTACCG CCGTCAGTCG CCCACGCGAA CCGGCGTCGA 3780

AAAGACCTTG GCTATGAACG ATGGCGCGAA AGCGTAGGAA GCTGCCGTAC GTCGGTCCTC 3840

TTCATTGGCA TTCGCGAAAT GCACGGATCT TTCGTGCTAG GTGCGGATAG TGGTCATGCG 3900

CATCGTATCA TCTCATCATT GGTGTGAGGG AACGCAATGC CTAGCGCTGT CGCCCCAGCA 3960

TCGCGCGCTT TAGCTCGTGG CAACCCGGAG CGAGCACTTG AGACACACGG TCGGCCTGAT 4020

CACCGAACTT AGTGTCGAAT AAATATCTAT TTATTTAGTA GAGATTCCTA TGTCAAGTGC 4080

AAGAAGAGGC CCCAGAGGAT GGTTTGAACC GCGCCGGGTT CGACGGTCAG ATCGATCTCG 4140

AACC 4144 



